Garage distributed object storage via Ansible

I recently built a beginner-friendly Ansible playbook for Garage, an S3-compatible distributed object storage service.

What is garage-docker-ansible-deploy?

Garage is an open-source distributed object storage service tailored for self-hosting. The Ansible playbook garage-docker-ansible-deploy helps you set up such a cluster.

It comes with “batteries included”, so it automatically installs Docker and sets up a reverse proxy (Traefik).

You may be familiar with some related Ansible playbooks that this playbook is based on.

These playbooks are masterfully maintained by spantaleev and the community. I copied their design and reuse some of their roles, e.g. to install Traefik.

Opinionated Design

Garage is very flexible software that can serve a lot of use cases. The playbook is opinionated in the sense that it trades some of Garage's flexibility for an easy deployment that should cover common use cases. The playbook currently encourages a layout where

  • 1 garage data node is used per physical drive that should be used by the cluster
  • 1 gateway node is used per host to make redundant setups possible

Each host is assumed to have a public IPv4/IPv6 address, and every node should have a dedicated subdomain, plus one subdomain per gateway on the host.
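To make the layout rule concrete, here is a minimal Python sketch that derives the node list for one host from its drives. It is not part of the playbook: the function name build_nodes, its parameters, and the port scheme (3901 for the gateway, then steps of 10 per data node) are assumptions that merely mirror the example configuration further down.

```python
def build_nodes(drive_paths, domain, base_port=3901, capacity=3):
    """Return one gateway node plus one data node per drive.

    Hypothetical helper: the dictionary keys mirror the example
    configuration in this post, the port scheme is an assumption.
    """
    nodes = [{
        "name": "gateway1",
        # Jinja expressions are kept as plain strings for Ansible to resolve
        "metadata_path": "{{ garage_garage_meta_path }}/gw1",
        "data_path": "{{ garage_garage_data_path }}/gw1",
        "gateway": True,
        "rpc_bind_port": base_port,
        "node_addr": f"garage-gw1.{domain}",
        "s3_api_addr": f"s3.{domain}",
    }]
    for i, path in enumerate(drive_paths, start=1):
        nodes.append({
            "name": f"node{i}",
            "gateway": False,
            "capacity": capacity,
            "metadata_path": f"{path}/metadata",
            "data_path": f"{path}/data",
            "rpc_bind_port": base_port + 10 * i,  # assumed: one port per node
            "node_addr": f"garage-node{i}.{domain}",
        })
    return nodes


# Example: a host with two drives ends up with three nodes
example_nodes = build_nodes(
    ["/media/drive1/garage/node1", "/media/drive2/garage/node2"],
    "example.com",
)
for node in example_nodes:
    print(node["name"], node["rpc_bind_port"], node["node_addr"])
```

The point of the sketch is only that the number of data nodes follows the number of drives, while the gateway is added once per host.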

When all of this comes together, a Garage host might look something like this:

A Garage host with one gateway node and 2 data nodes that expose the ports 3901, 3911 and 3912. A Traefik server exposes port 443. Everything is contained within server1, which has the IP 42.42.42.42.

Example layout with one host that has 2 nodes (as it has two drives where data will be stored)

You need to configure the DNS records to point to server1 yourself; the playbook takes care of everything else with the following configuration.

garage_garage_node1_base_path: "/media/drive1/garage/node1"
garage_garage_node2_base_path: "/media/drive2/garage/node2"
garage_garage_nodes:
  - name: "gateway1"
    metadata_path: "{{ garage_garage_meta_path }}/gw1"
    data_path: "{{ garage_garage_data_path }}/gw1"
    gateway: true
    rpc_bind_port: 3901
    node_addr: "garage-gw1.example.com"
    s3_api_addr: "s3.example.com"
  - name: "node1"
    gateway: false
    capacity: 3
    metadata_path: "{{ garage_garage_node1_base_path }}/metadata"
    data_path: "{{ garage_garage_node1_base_path }}/data"
    rpc_bind_port: 3911
    node_addr: "garage-node1.example.com"
  - name: "node2"
    gateway: false
    capacity: 3
    metadata_path: "{{ garage_garage_node2_base_path }}/metadata"
    data_path: "{{ garage_garage_node2_base_path }}/data"
    rpc_bind_port: 3921
    node_addr: "garage-node2.example.com"
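
Before running the playbook, it can help to sanity-check such a node list. The following Python sketch is a standalone pre-flight check, not something the playbook provides: the helper names duplicate_ports and dns_names are made up for illustration, and the node list is a trimmed copy of the configuration above. It lists the hostnames that need DNS records and flags RPC ports that are used twice on the same host.

```python
from collections import Counter


def duplicate_ports(nodes):
    """Return the rpc_bind_port values used by more than one node, sorted."""
    counts = Counter(node["rpc_bind_port"] for node in nodes)
    return sorted(port for port, n in counts.items() if n > 1)


def dns_names(nodes):
    """Return every hostname the nodes reference, in order, without repeats."""
    names = []
    for node in nodes:
        for key in ("node_addr", "s3_api_addr"):
            name = node.get(key)
            if name and name not in names:
                names.append(name)
    return names


# Trimmed copy of the example configuration (paths omitted)
nodes = [
    {"name": "gateway1", "rpc_bind_port": 3901,
     "node_addr": "garage-gw1.example.com", "s3_api_addr": "s3.example.com"},
    {"name": "node1", "rpc_bind_port": 3911,
     "node_addr": "garage-node1.example.com"},
    {"name": "node2", "rpc_bind_port": 3921,
     "node_addr": "garage-node2.example.com"},
]

clashes = duplicate_ports(nodes)
if clashes:
    raise SystemExit(f"rpc_bind_port used more than once: {clashes}")

# Every one of these names needs a DNS record pointing to the host
for name in dns_names(nodes):
    print(f"{name} -> 42.42.42.42")
```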

Limitations

While the playbook should of course be reusable and fairly modular, it will never be a solution for all use cases. The playbook does not cover

  • Setting up domains (but there are instructions)
  • Detailed management of the buckets and keys: There are basic features to create buckets and access keys but management will not be in the scope of the playbook
  • Connecting nodes via (mesh) VPN as mentioned in the project documentation

Getting started

Be aware that the playbook is not yet widely used, so I don’t have much more to go on than my own experience. I am happy to help if you hit bumps in the road.

Student of Medical Informatics, Developer, He/Him